Efficient Substring Traversal with Suffix Arrays

Authors

  • Toru Kasai
  • Hiroki Arimura
  • Setsuo Arikawa
Abstract

The substring traversal problem is the problem of enumerating all branching substrings appearing in a given text. Although this problem is easily solved with the suffix tree of McCreight (1976), a space-efficient and practically fast solution is important. We devise a simple and efficient algorithm that simulates the traversal of the suffix tree for a given text using the suffix array of Manber and Myers (1993) and Gonnet, Baeza-Yates, and Snider (1992). The algorithm runs in O(n) time and 5n B bulk I/O with the suffix array and an additional structure called the height array, while the naive algorithm using binary search on the suffix array requires O(n^2) time in the worst case. The space requirement of our algorithm, 7N bytes, is smaller than the 15N bytes of the traversal algorithm with the suffix tree. A linear-time algorithm for computing the height array from the suffix and the height arrays is also presented. Computer experiments on real datasets showed that our traversal algorithm with the suffix array is an order of magnitude faster than the naive simulation method and comparable to the traversal algorithm with the suffix tree.
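The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, the suffix array is built by naive sorting for brevity, the height array is taken to hold the longest-common-prefix length of lexicographically adjacent suffixes, and a "branching substring" is assumed to mean a non-empty string labelling an internal node of the suffix tree, with the end of the text treated as a distinct terminator.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; acceptable for a sketch only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_height_array(text, sa):
    # height[k] = length of the longest common prefix of the suffixes
    # starting at sa[k-1] and sa[k]; height[0] is 0 by convention.
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    height = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            height[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return height


def branching_substrings(text):
    # Simulate a bottom-up suffix tree traversal with a stack over the
    # height array. Returns (substring, number_of_occurrences) pairs.
    sa = build_suffix_array(text)
    height = build_height_array(text, sa)
    n = len(text)
    stack = [(0, 0)]  # (common prefix length, left boundary in sa)
    found = []
    for k in range(1, n + 1):
        h = height[k] if k < n else 0  # a final 0 flushes the stack
        left = k - 1
        while stack[-1][0] > h:
            lcp, left = stack.pop()
            found.append((text[sa[left]:sa[left] + lcp], k - left))
        if stack[-1][0] < h:
            stack.append((h, left))
    return found


print(branching_substrings("banana"))
```

Each position of the height array is pushed and popped at most once, so the traversal loop is linear in the text length once the two arrays are available; only the naive sorting step in this sketch is slower.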


Similar resources

Optimal exact string matching based on suffix arrays

Using the suffix tree of a string S, decision queries of the type "Is P a substring of S?" can be answered in O(|P|) time, and enumeration queries of the type "Where are all z occurrences of P in S?" can be answered in O(|P| + z) time, totally independent of the size of S. However, in large-scale applications such as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix ...
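For contrast with the suffix tree bounds quoted above, the following sketch answers both query types with plain binary search on a suffix array. It is a textbook baseline, not the optimal method that paper's title refers to: each comparison costs up to |P| character comparisons, so a query takes O(|P| log n) time. The function names are hypothetical and the suffix array is built by naive sorting.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; acceptable for a sketch only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # Return the sorted start positions of pattern in text.
    # Suffixes that begin with pattern form one contiguous block of sa,
    # so two binary searches locate the block boundaries.
    m = len(pattern)

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ana"))       # enumeration query
print(bool(find_occurrences(text, sa, "nab")))  # decision query
```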


Generalizations of suffix arrays to multi-dimensional matrices

We propose multi-dimensional index data structures that generalize suffix arrays to square matrices and cubic matrices. Giancarlo proposed a two-dimensional index data structure, the Lsuffix tree, that generalizes suffix trees to square matrices. However, the construction algorithm for Lsuffix trees maintains complicated data structures and uses a large amount of space. We present simple and practical ...


Constructing Suffix Arrays of Large Texts

Recently, Sadakane [12] proposed a new fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes is important because the array of suffix indexes, called the suffix array, is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in Block Sorting text compression, therefore fast sor...


A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation

We propose a fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes is important because the array of suffix indexes, called the suffix array, is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in Block Sorting text compression, therefore fast sorting algorithms are desire...
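The link between suffix sorting and the Burrows-Wheeler transformation mentioned above can be shown in a few lines. This sketch is not the fast sorting algorithm the abstract proposes: it sorts suffixes naively, and it assumes a sentinel character `$` that does not occur in the text and compares lower than every text character. The function name is hypothetical.

```python
def bwt_from_suffix_array(text, sentinel="$"):
    # Assumption: sentinel is absent from text and sorts before all of
    # its characters, so that suffix order equals rotation order.
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Naive suffix sorting; a real implementation would use a faster sort.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The transformed string lists the character preceding each sorted
    # suffix; for the suffix at position 0 this wraps to the sentinel.
    return "".join(s[i - 1] for i in sa)


print(bwt_from_suffix_array("banana"))
```

Once the suffixes are sorted, the transformation is a single pass over the suffix array, which is why the sorting step dominates the cost.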




Publication date: 2007